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Abstract — This paper deals with the estimation of the modes 
of an univariate mixture when the number of components is 
known and when the component density are well separated. 
We propose an algorithm based on the minimization of the 
"kp" criterion we introduced in a previous work. In this paper 
we show that the global minimum of this criterion can be 
reached with a linear least square minimization followed by a 
roots finding algorithm. This is a major advantage compared 
to classical iterative algorithms such as K-means or EM which 
suffer from the potential convergence to some local extrema of the 
cost function they use. Our algorithm performances are finally 
illustrated through simulations of a five components mixture. 

Index Terms — 

univariate mixture, separated mixture components, multimodal 
estimation, non-iterative algorithm 
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I. Introduction 

In this paper we focus on the estimation of the modes of 
an univariate mixture with a known number of components. 
When the mixture component belongs to a parameterized 
family known by the estimator (gaussian mixture case for 
instance), the observation estimated moments can be mapped 
to the mixture parameters [1]. Yet, when the number of 
components is high, the relationships between the moments 
and the mixture parameters are usually too complicated to be 
analytically solved. Alternately, the Expectation-Maximization 
(EM) [2] algorithm is the most commonly used method when 
the mixture densities belong to a parameterized family. It is 
an iterative algorithm that look for the mixture parameters 
that maximize the likelihood of the observations. The EM 
iteration consists of two steps. The Expectation step estimates 
the probability for each observation to come from each mixture 
component. During the Maximization step, these estimated 
probabilities are used to update the estimation of the mixture 
parameters. If the mixture components do not belong to any 
parameterized family, or if the parameterized family is not 
known by the estimator, the moment method and the EM 
algorithm do not directly apply. Yet, if the mixture compo- 
nents density are identical and quite separated, any clustering 
methods can be used to cluster the data and calculate the 
clusters means to reach the mixture modes. A survey of the 
clustering techniques can be found in [3]. Among them, the 
K-means algorithm [4] is one of the most popular method. 



It is an iterative algorithm which groups the data into K 
clusters in order to minimize an objective function such as 
the sum of point to cluster mean square Euclidean distance. 
K-means alternately assign each data to the closest cluster 
center, compute the new clusters centers and calculate the 
resulting cost function. A data assignment is validated only 
if it decreases the overall cost function. The main drawback 
of K-means or EM is the potential convergence to some local 
extrema of the criterion they use. Some solutions consist for 
instance in using smart initializations ( [5] [6] for EM, [7] for 
k-means) or stochastic optimization, to become less sensitive 
in the initialization ( [8] [9] for EM, [10] for K-means). 
Another drawback of these methods is the convergence speed, 
which can be very slow when the number of observations is 
high. In this contribution, we propose a non-iterative algorithm 
which mainly consists in calculating the minimum of the "k- 
product" (kp) criterion we first introduced in [11]. The main 
motivation for using such criterion is that its minimization 
has a global solution which can be reached by a least square 
optimization followed by a roots finding algorithm. The paper 
is organized as follows: In section 2 the observation model is 
presented and the criterion is defined. In section 3 the criterion 
global minimum is theoretically calculated. In section 4 the 
mode estimation algorithm is described. Section 5 presents 
some simulations which illustrate the algorithm performances 
for a 5 components mixture and conclusions are given in 
Section 6. 

II. OBSERVATION MODEL AND CRITERION DEFINITION 

Let 5 be a discrete random variable taking its values in the 
set {a k }ke{i-K} ofR K with probabilities {n k } ke{1 ... K} and 
let v be a random zero-mean variable with probability density 
function g(v). The multimodal observation z is given by: 



z = 6 + v 



(1) 



A 



We call a the vector of the modes defined by a 
(ai,a2 ■ • -or-)*. We suppose that the a k are all distincts: the 
probability density of z, f(z), is then a finite mixture of K 
identical densities with the mixing weights {^k]k^{i - K}'- 



K 



f{z) = ^2n k g(z - a k ) 



(2) 



k=l 



2 



Let {zn} n e{i-N} be a set of N observations in Mr. In all 
the following we assume that N is superior to K and that the 
number of different observations is superior to K — 1. The kp 
criteria J(x) is defined by: 



J 



N K 

EIK* 

n=l fc=l 



Xk) 



(3) 



This criterion has been introduced in [1 1]. It is clearly positive 
for any vector x. The first intuitive motivation for defining this 
criterion is its asymptotic behavior when v is null. In this case, 
all the observations are equal to one of the ak and therefore 
J (a) = 0. J(x) is then minimal when x is equal to a or any 
of its K\ permutations. The second motivation is that, in the 
general case, J have K\ minima that are the K\ permutations 
of one single vector which can be reached with a linear least 
square solution followed by a roots finding algorithm. This is 
shown in section 3. 

III. KP GLOBAL MINIMUM 



We first provide in section IIII-AI some useful definitions 
which are needed in section lHI-Bl to reach the global minimum 
of J. 

A. Some Useful Definitions 

To any observation z n we associate the vector z„ defined 
by: 



(4) 



The vector z and the Hankel matrix Z are then respectively 
defined by: 



N 

z = S ' z n z n , z G 



n=l 
N 



(5) 



(6) 



71=1 



Let y = (j/i, ■ • ■ , UkY be a vector of R K . We define the 
polynomial of order K q y (a) as: 



K 



q y (a) = a K -J2a K k y k 



(7) 



fe=i 

if r = (n, • • • , Tk)* is a vector of containing the if roots 
of g y (o!) the factorial form of q y (ce) is: 



K 



Qy( a ) = Y[( a - r k) 



(8) 



where Wfc(r) is the Elementary Symmetric Polynomial (ESP) 
( [12]) in the variables n, • • • , rx defined by: 



= (-1)* 



E 



r h- r h--'- r 3k (H) 



{Jir-Jjefl-K}' 
3'i<-<Jk<K 



If we call w(r) the vector of ESP of r defined by: 
w(r) = (toi(r), • • ■ , w K {r)f 



(12) 



the relationship between the roots and coefficients of q y (a) 
becomes: 

y = w(r)^Vfce{l---if} g y (r fe )=0 (13) 

fi. The KP Minimum 

The main idea is to express J(x) as a function of w(x): 
using definitions and dT2b . the development of each term of 



the sum in J leads to J(x) —^2 



N 



..K 



— z^w(x)) . There- 



fore, the minization of J becomes a least square minimization 
in the variable w(x). The vector y min which minimizes 
Yl n —i { z n ~ z ny) can be easily obtained. Now if x m i n is 
a vector such as y mm = w(x mm ) and x min € R K , then x min 
is clearly a minimum of J. According to ([T3l l. x m j„ has to 
contain the K roots of q y . [a) to have y min = w(x m j„). 
The difficult part is to show that these K roots of q y (a) 
are always real: 

Theorem 1: if y mi „ is the solution of Z.y min = z (where 
Z and z have been defined in © and (O) and if x m j„ is a 
vector containing, in any order, the K roots of q y . (a), then 
x m i„ belongs to W K and x m ; n is the global minimum of J. 
Proof: Let F be the function defined by: 



F 



The restriction of F to 
observations z„ are real: 



N K 

*^^n\\Zn-X k \\l (14) 

71=1 fc = l 

is the function J since the 



Vx G 



F(x) = J(x) 



Now let ii be the function defined by: 



H 



N 

E 



n n 



(15) 



(16) 



The function H applied to the ESP of a vector x in C fe is 
equal to the function F applied to x: 



Vx G C K : F(x) = J2 



K 



Y\ (z n - x k ) 



fe=l 



(17) 



developping ( fTTT i using definition (fTTb leads to: 



q y (a) = a A -(riH Vr K )oc (r x xr 2 ■ ■ -xr K ) 

(9) 

K 

q y (a) = a K - ^ a K ~ k w k {r) (10) 



fe=i 



Vx G C K : F(x) = 



K 



K 



fc=i 



including definitions (O and ( fT2b : 



(18) 
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Vx e 



N 

E 

n=l 



f( X ) = >:h^-zXx)||^ 



Vx e 



F(x) = i/(w(x)) 



(19) 



(20) 



The global minimum of H is the linear least square solution 
Ymin 8 iven b y : 



argmin ■ 



' N 

El 

, n— 1 



.A" 




(21) 



developping d2"TT i using definitions (O and © and remember- 
ing that the coefficients of Z and z are real: 



y m m = argmin {y^ z y - 2Re{y ff }z} 



z.y. 



z, 



(23) 



The Hankel matrix Z is regular since the number of dif- 
ferent observations is superior to K — 1 [13]. Sytem (l23l 
therefore have exactly one solution. Since Z belongs to 
R KxK and z belongs to R K , y min belongs to R K . Now let 
w(a:i,mm, • • • ,XK,miny be a vector containing, in any 
order, the if (potentially complex) roots of q y (a). One can 
show that the following holds: 

(i) x m i n is a global minimum of F 

(ii) x mi „ G M K 

(iii) x m i„ is a global minimum of J 
Property (i) is a direct consequence of (1201 1 : 



Vx G 

Vx g 
Vx g 

According to (1131 , y 

Vx e C 
Vx e 



min 
K . 



F(x) = H(w(x)) (24) 

F(x) > min{iJ} (25) 

F(x) > H{y min ) (26) 
= w(x m i„) and we have: 

F(x) > H(w(x min )) (27) 

F(x) > F{x mm ) (28) 



which proves (i). Property (ii) can be shown by contradiction: 
if x m i n does not belong to M. K , then for one of the Xf,, m in we 
have Xk,min ^ Rel^fc.mm} and, since all the observations z n 
are real: 



Vn G {!,-■■ ,N} 
which leads to: 



(29) 



(30) 



This is impossible SinCC ^-rnin IS cl global minimum of _F. 
This proves property (ii). We finally have to prove (iii): since 



pA' 



we have, using (fTBT i: 



Furthermore, according to ( 115) : 

Vx G E K : J(x) = F(x) 
Vx G M A ' : J(x) > mm{F} 



(31) 

(32) 
(33) 



TABLE I 

KP ALGORITHM STEPS AND COMPLEXITIES 

step 1: calculate a minimum of J 

calculate Z and z: o(NK) 
calculate y min by solving (23V o(A" 2 ) 
calculate the roots (x ltmin , ■ ■ ■ ,x K ,min) of qy min (a): o{K 2 ) 
step 2: clustering and mode estimation 
assign each z n to the closest m i„: o(NK) 
calculate the K means of the resulting clusters: o(7V) 



then, according to property (i): 



(22) usln 8 



(HQ: 



Vx G 



Vx G 



pA 



^(x) > F(x min ) 



J(x) > J{x min ) 



(34) 



(35) 



which proves (iii). Properties (ii) and (iii) directly lead to 
theorem 1. ■ 

IV. MODES ESTIMATION ALGORITHM 

The mode estimation algorithm consists of two steps. In 
the first step, the minimum of J, {xi t min, XK,min}, is 
calculated, giving a first raw estimation of the set of modes. In 
the second step, each observation z n is assigned to the nearest 
estimated mode, K clusters are formed, and the final set of 
estimated modes is given by the means of the K clusters. 
The algorithm steps and their complexities are illustrated in 
table J] It appears from table Q] that the global complexity is 
in o(NK + K 2 ), which is equivalent to o(NK) since N is 
superior to K. 

V. SIMULATION 

For each simulation run, a set of N = 100 observa- 
tions is generated from the mixture described in ([T]) with 
K = 5 modes, a = (0,1,2,3,4)* and n k = 0.2 for all 
k in {1, • • • ,K}. The density of v is a zero-mean Laplace 

A 2 
distribution given by g(v) = — e~ A '' u ' with a variance — = 

_„ 2 A 

10 . This leads to well separated mixture component; the 

observation multimodal pdf is shown in figure 1 . We suppose 

that the form of g(v) is not known by the estimator. Therefore 

a moment matching method or the EM algorithms would 

not directly apply. The kp algorithm is then compared to 

the K-means algorithm. The K-means algorithm is randomly 

initialized and the used cost function is the point to cluster 

mean square Euclidean distance. K-means is stopped when 

the cost function no longer decreases. The number of modes 

is supposed to be known in each method. 10000 runs have been 

performed. To get rid of the permutation ambiguity, for each 

run r, the estimated mode (a r ) accuracy is characterized by 

the maximal absolute distance D r between the sorted vector 

of mode and the sorted vector of estimated mode: 



D r = AT(sort(a) - sort(a r )) 



(36) 



where iV(x) = max \xk\. The distribution of D r is given 

ke{i-K} 

in figure 2. With the K-means algorithm, D r is inferior to 



4 



observation pdf 



2.5- 



2 - 



1.5- 



1 - 




-1 1 2 3 4 5 

observation 

Fig. I. probability density function of a five components univariate mixture 
with components means {0,1,2,3,4}, mixing weight {0.2,0.2,0.2,0.2,0.2} and 

A 2 
common component density g(v) = — e~ A H with variance — = f0~ 2 . 



distribution of D f for 1 0000 runs ot 1 00 observations 




Fig. 2. comparison of kp (*) and k-means (o) algorithm applied to the five- 
component mixture described in figure 1; for each method, 10000 runs of 100 
observations have been generated; for each run, the performance criteria, D r , 
is the maximal absolute distance between the sorted vector of modes and the 
sorted vector of estimated modes. 



0.1 for 41.3% of the run and inferior to 0.2 for 41.4% of 
the run. Yet, for 58.6% of the run, D r is superior to 0.7, 
which corresponds to a poor estimation of the set of modes. 
In this case the K-means method has converged to a local 
minimum of its cost function. Typically, one estimated mode 
is located in the middle of two true modes (gathering two true 
clusters) while two other estimated modes are closed to the 
same true mode. In this configuration of estimated modes, D r 
is around 1. On the contrary, the kp algorithm always provides 
an accurate set of estimated modes: D r remains inferior to 0.1 
for 98.7% of the run and D r remains inferior to 0.2 for 99.6% 
of the run. 



VI. CONCLUSION 

We have provided a global minimum of the new "kp" crite- 
rion we first introduced in [11] and used it for the estimation of 
the modes of univariate mixture whose component density are 
common and well separated. The form of the mixture densities 
does not have to be known by the mode estimator and does not 
have to belong to any particular parameterized family. Simu- 
lations have illustrated the kp algorithm good performances in 
the case of an univariate mixture of five Laplace distributions. 
In particular, the simulations have shown the superiority of our 
algorithm to the K-means algorithm which often converge to 
local minima of the used cost function. The generalization to 
multivariate mixture is now being studied, as well as the use 
of the kp criteria for estimating the number of components in 
a mixture. 
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